Extracting Linguistic Knowledge from an International Classification

Authors

  • Robert Baud
  • Christian Lovis
  • Pierre-André Michel
  • Jean-Raoul Scherrer
Abstract

Automatic extraction of knowledge from large text corpora is an essential step toward linguistic knowledge acquisition in the medical domain. Large computer-readable medical lexicons are currently lacking, with a partial exception for English. Moreover, multilingual lexicons versatile enough to serve applications in several languages will remain out of reach as long as only manual extraction is considered; computer-assisted linguistic knowledge acquisition is a must. A multilingual lexicon differs from a monolingual one in that it must bridge words across languages: a kind of interlingua has to be built in the form of concepts to which the language-specific entries are attached. In the present approach, the authors have developed an intelligent rule-based tool that focuses on a multilingual source of medical knowledge such as the International Classification of Diseases (ICD), which contains a vocabulary of some 20,000 words and has been translated into numerous languages.
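The idea of concepts acting as an interlingua, with language-specific entries attached to them, can be illustrated with a minimal Python sketch. This is not the authors' tool: the class names, the concept identifier `C_FRACTURE`, and the sample words are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent node (hypothetical structure, not from the paper)."""
    concept_id: str
    # language code -> set of words attached to this concept
    entries: dict = field(default_factory=lambda: defaultdict(set))


class MultilingualLexicon:
    """Words in different languages are bridged only through shared concepts."""

    def __init__(self) -> None:
        self.concepts = {}              # concept_id -> Concept
        self._index = defaultdict(set)  # (language, lowercased word) -> concept ids

    def attach(self, concept_id: str, lang: str, word: str) -> None:
        """Attach a language-specific entry to a concept, creating it if needed."""
        concept = self.concepts.setdefault(concept_id, Concept(concept_id))
        concept.entries[lang].add(word)
        self._index[(lang, word.lower())].add(concept_id)

    def translate(self, word: str, source: str, target: str) -> list:
        """Return target-language words sharing at least one concept with `word`."""
        result = set()
        for cid in self._index.get((source, word.lower()), ()):
            result |= self.concepts[cid].entries.get(target, set())
        return sorted(result)


# Illustrative data only; the identifier and words are assumptions.
lex = MultilingualLexicon()
lex.attach("C_FRACTURE", "en", "fracture")
lex.attach("C_FRACTURE", "fr", "fracture")
lex.attach("C_FRACTURE", "de", "Fraktur")
lex.attach("C_FRACTURE", "de", "Bruch")

german = lex.translate("fracture", "en", "de")
```

Because every entry points at a concept rather than at entries in other languages, adding a new language requires only attaching its words to existing concepts, not building pairwise dictionaries.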

Similar resources

Abstract Concepts Through the Lens of Linguistic and Extra-Linguistic Knowledge

The paper deals with the rigorous methods used in the research of concepts representing abstract notions like “friendship”, “love”, “hatred”, “conscience”, and “envy”. Concepts of that kind have no visible physical support in the material world except for the sound forms of the words representing them, thus causing additional difficulties in classification, research and analysis as well as stip...

Extracting Decision Rules from Linguistic Data Describing Economic Phenomena. The Approach Based on Decision Systems over Ontological Graphs and PSO

The aim of the paper is to present a heuristic method for extracting the most general decision rules from linguistic data describing economic phenomena included in simple decision systems over ontological graphs. Such decision systems have been proposed to deal with linguistic attribute values, describing objects of interest, which are concepts placed in semantic spaces expressed by means of on...

Extracting Knowledge from Text with PIKES

In this demonstration we showcase PIKES, a Semantic Role Labeling (SRL)-powered approach for Knowledge Extraction. PIKES implements a rule-based strategy that reinterprets SRL output in light of other linguistic analyses, such as dependency parsing and co-reference resolution, thus properly capturing and formalizing in RDF important linguistic aspects such as argument nominalization, frame-fram...

A hybrid approach for extracting semantic relations from texts

We present an approach for extracting relations from texts that exploits linguistic and empirical strategies, by means of a pipeline method involving a parser, part-of-speech tagger, named entity recognition system, pattern-based classification and word sense disambiguation models, and resources such as ontologies, knowledge bases and lexical databases. The relations extracted can be used for vario...

Linguistic Knowledge-driven Approach to Chinese Comparative Elements Extraction

The BI (比)-structure, which highlights a contrasting characteristic between two items, is the key comparative sentence structure in Chinese. In this paper, we explore methods for extracting the 6 constituents of the BI-structure. Previous studies are often restricted to probabilistic classification methods, where the features used hardly embody linguistic knowledge and are therefore unintuitive...

Publication date: 2008